Automatic extraction and evaluation of MWE
Abstract
This short paper presents a method for automatically extracting and evaluating multiword expressions (MWEs) in the Europarl corpus. For this purpose we use mwetoolkit and draw on its output to derive rules for the automatic evaluation of MWEs. We then developed an XML parser that evaluates MWE candidates against those rules and against online dictionaries. Linguists manually evaluated a sample of the results, which yielded a precision of 87%.
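The rule-based filtering and precision measurement described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the XML layout is hypothetical and is not claimed to match the files mwetoolkit produces, the part-of-speech rule set (`ACCEPTED_PATTERNS`) is invented, and the manual judgments are made-up sample data, not the paper's results.

```python
import xml.etree.ElementTree as ET

# Hypothetical candidate list. This layout is an illustration only and is
# NOT claimed to match the XML that mwetoolkit actually writes.
SAMPLE_XML = """
<candidates>
  <cand id="1"><w lemma="prime" pos="N"/><w lemma="minister" pos="N"/></cand>
  <cand id="2"><w lemma="of" pos="P"/><w lemma="the" pos="D"/></cand>
  <cand id="3"><w lemma="common" pos="A"/><w lemma="market" pos="N"/></cand>
  <cand id="4"><w lemma="last" pos="A"/><w lemma="year" pos="N"/></cand>
</candidates>
"""

# Assumed rule set: POS sequences accepted as plausible MWEs.
ACCEPTED_PATTERNS = {("N", "N"), ("A", "N")}


def parse_candidates(xml_text):
    """Return {candidate id: [(lemma, pos), ...]} parsed from the XML string."""
    root = ET.fromstring(xml_text)
    return {
        cand.get("id"): [(w.get("lemma"), w.get("pos")) for w in cand.findall("w")]
        for cand in root.findall("cand")
    }


def passes_rules(words):
    """A candidate passes if its POS sequence is one of the accepted patterns."""
    return tuple(pos for _, pos in words) in ACCEPTED_PATTERNS


def precision(accepted_ids, manual_judgments):
    """Share of accepted candidates that annotators judged to be true MWEs.

    Only candidates that were actually judged are counted.
    """
    judged = [cid for cid in accepted_ids if cid in manual_judgments]
    if not judged:
        return 0.0
    return sum(manual_judgments[cid] for cid in judged) / len(judged)


candidates = parse_candidates(SAMPLE_XML)
accepted = [cid for cid, words in candidates.items() if passes_rules(words)]

# Made-up manual judgments for the accepted sample (True = genuine MWE).
manual = {"1": True, "3": True, "4": False}
score = precision(accepted, manual)
print(f"accepted={accepted} precision={score:.2f}")
```

In this sketch, precision is computed only over candidates that both pass the rules and were manually judged, mirroring the idea of evaluating a sample of the output rather than the full candidate list. A dictionary lookup step could be added as a second filter alongside `passes_rules`.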
Similar resources
Project proposal Automatic extraction and evaluation of MWE: adapting method to French Language Technology: Research and Development
Our project is based on the theme of Multi Word Expressions (MWE); we will focus on the problem of extraction. This task is important for improving lexical resources used for tasks such as tokenization, parsing or translation. In our study we will work on a French corpus. Our aim will be not only to select candidates but also to validate automatically which ones are true MWEs. If we have time we wil...
Extracting Multiword Expressions With A Semantic Tagger
Automatic extraction of multiword expressions (MWE) presents a tough challenge for the NLP community and corpus linguistics. Although various statistically driven or knowledge-based approaches have been proposed and tested, efficient MWE extraction still remains an unsolved issue. In this paper, we present our research work in which we tested approaching the MWE issue using a semantic field ann...
Clustering-based Approach to Multiword Expression Extraction and Ranking
We present a domain-independent clustering-based approach for automatic extraction of multiword expressions (MWEs). The method combines statistical information from a general-purpose corpus and texts from Wikipedia articles. We incorporate association measures via dimensions of data points to cluster MWEs and then compute the ranking score for each MWE based on the closest exemplar assigned to a...
Re-examining Automatic Keyphrase Extraction Approaches in Scientific Articles
We tackle two major issues in automatic keyphrase extraction using scientific articles: candidate selection and feature engineering. To develop an efficient candidate selection method, we analyze the nature and variation of keyphrases and then select candidates using regular expressions. Secondly, we re-examine the existing features broadly used for the supervised approach, exploring different ...
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...